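\[
E_{ij} =
\begin{cases}
\infty, & \text{for } r_{ij} < 3 \\
E^{rep}, & \text{for } 3 < r_{ij} < R_{ij}^{rep} \\
\varepsilon_{ij}, & \text{for } R_{ij}^{rep} < r_{ij} < R_{ij} \\
0, & \text{for } R_{ij} < r_{ij}
\end{cases}
\tag{8}
\]
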
where $\varepsilon_{ij}$ were the pairwise interaction parameters, $r_{ij}$ was the
distance between chain beads i and j, $E^{rep} = 3kT$ was a constant repulsive term
operating at very short distances, and $R_{ij}^{rep}$ and $R_{ij}$ were the cut-off
values that depended on amino acid type. The values of these cut-off parameters were
provided in Table VII.



Table VII. Compilation of pairwise cut-off distances for pairwise interactions

A_i                      A_j      R_ij^rep (Å)   R_ij (Å)
Small (a)                Small    4.35 (b)       5.97
Large (c)                Large    4.83           6.80
Other combinations (d)            4.57           6.32

(a) Small amino acids are: Gly, Ala, Ser, Cys.
(b) This value corresponds to the excluded volume radius of three lattice units;
    therefore, for pairs of small amino acids, the soft-core envelope does not exist.
(c) Large amino acids are: Phe, Tyr, Trp.
(d) Small-large, other (than small or large)-large, other-small.

The interaction parameters depended not only on amino acid identity, but 
also on their positions in the polypeptide chain because the derivation of the 
potentials also used evolutionary information. A more detailed description of the 
derivation of these potentials is found elsewhere. The total energy contribution
from the pairwise interactions was therefore calculated as follows:

\[ E_{pair} = \sum_{i} \sum_{j>i} E_{ij} \tag{9} \]

where the summations were over all j>i pairs of residues.
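
The pairwise term described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: distances are expressed in Å, the hard core of three lattice units is taken as 4.35 Å (following the footnote to Table VII), the hard-core energy is treated as infinite, boundary values are assigned to the outer shell, and the parameters `eps[i][j]` are supplied by the caller as a hypothetical position-specific matrix in kT. The function names are illustrative only.

```python
import math

HARD_CORE = 4.35   # Å; three lattice units per the Table VII footnote (assumed units)
E_REP = 3.0        # kT; constant soft-core repulsion

SMALL = {"GLY", "ALA", "SER", "CYS"}
LARGE = {"PHE", "TYR", "TRP"}


def cutoffs(aa_i, aa_j):
    """Return (R_rep, R) in Å for a residue pair, following Table VII."""
    if aa_i in SMALL and aa_j in SMALL:
        return 4.35, 5.97
    if aa_i in LARGE and aa_j in LARGE:
        return 4.83, 6.80
    return 4.57, 6.32  # all other combinations


def pair_energy(r, aa_i, aa_j, eps_ij):
    """Piecewise pair energy in kT for two beads separated by r (Å)."""
    r_rep, r_cut = cutoffs(aa_i, aa_j)
    if r < HARD_CORE:
        return math.inf   # excluded volume (assumed infinite)
    if r < r_rep:
        return E_REP      # soft-core envelope; empty for small-small pairs
    if r < r_cut:
        return eps_ij     # pair-specific interaction parameter
    return 0.0            # beyond the interaction cut-off


def total_pair_energy(coords, seq, eps):
    """Sum the pair energy over all j > i pairs of residues."""
    total = 0.0
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            total += pair_energy(r, seq[i], seq[j], eps[i][j])
    return total
```

For example, two small residues 5 Å apart fall inside the attractive shell and contribute `eps[i][j]`, whereas two large residues 4.5 Å apart fall inside the soft core and contribute 3 kT.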



5. Multibody potentials



The hydrophobic interactions in this model were partially accounted for by 
pairwise interactions between residues; however, this was not sufficient to generate
well-packed proteins. Thus, a surface-exposure-based statistical potential was
developed according to the following scheme: Each model residue was assigned 24
surface contact points. A specific subset of these contact points became occupied
upon contact with other residues. The main chain Cα atoms contributed separately
to the coverage of a given residue. The positions of the Cα atom could be quite well
approximated given the positions of three consecutive side chain beads. Some
contact points could be multiply occupied. The fraction of non-occupied surface 
points defined the exposed fraction of a given side chain. Potentials could be 
derived from a statistical analysis of the protein structures for which the solvent 
exposure had been determined on the atomic level. The total surface energy was 
computed as follows:

\[ E_{surface} = \sum_i E_b(A_i, a_i) \tag{10} \]

where $a_i$ described the coverage of residue $A_i$ and $E_b(A_i, a_i)$ was the
statistical potential when amino acid type $A_i$ had $a_i$ of its 24 surface points
occupied, i.e., the covered fraction of its surface was equal to $a_i/24$.

Studies of the distribution of inter-residue contacts in globular proteins have
shown that various amino acids have different tendencies to pack in a
parallel or antiparallel fashion. A contact between residues i and j was considered to
be "parallel" when $(v_{i-1} - v_i) \cdot (v_{j-1} - v_j) > 0$, and "antiparallel"
otherwise. Moreover, for a given residue there were strong correlations between the
number of parallel and antiparallel contacts given the total number of contacts. Due
to the reduced character of this model, the other contributions to the force field did
not properly account for such effects. Therefore, the model force field was
supplemented by the following multibody potential:

\[ E_{multi} = \sum_i E_m(A_i, n_{p,i}, n_{a,i}) \tag{11} \]

where $E_m(A, n_p, n_a)$ was the value of the statistical potential for residue type $A$
having $n_p$ parallel and $n_a$ antiparallel contacts. The reference state was a random
distribution of contacts. The values along particular diagonals ($n_p + n_a = n_c$) were
normalized such that the lowest energy for a diagonal was exactly equal to the value
of statistical potentials derived from the distribution of the total number of contacts
$n_c$ for a given type of residue.
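
The two multibody terms of this section can be sketched in Python as simple table lookups. This is an illustration under stated assumptions, not the implementation used in the model: the tables `e_b` and `e_m`, the list of contact pairs, and the per-residue counts of occupied surface points are hypothetical inputs; the vectors v are taken to be bead positions, so that the difference between consecutive positions gives the local chain direction; and combinations of contact counts missing from `e_m` are assumed to contribute zero.

```python
N_SURFACE_POINTS = 24  # surface contact points assigned to each model residue


def covered_fraction(occupied_points):
    """Covered fraction of a residue with the given number of occupied points."""
    return occupied_points / N_SURFACE_POINTS


def surface_energy(seq, occupied, e_b):
    """Sum E_b(A_i, a_i); e_b[aa] is a hypothetical 25-entry table (a_i = 0..24)."""
    return sum(e_b[aa][a] for aa, a in zip(seq, occupied))


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_parallel(pos, i, j):
    """True when (v[i-1] - v[i]) . (v[j-1] - v[j]) > 0 (positions assumed)."""
    if i < 1 or j < 1:
        raise ValueError("the first bead has no preceding bead")
    return _dot(_sub(pos[i - 1], pos[i]), _sub(pos[j - 1], pos[j])) > 0


def count_contacts(pos, contacts):
    """Count parallel and antiparallel contacts per residue.

    `contacts` is a caller-supplied list of (i, j) index pairs.
    """
    n_p = [0] * len(pos)
    n_a = [0] * len(pos)
    for i, j in contacts:
        counts = n_p if is_parallel(pos, i, j) else n_a
        counts[i] += 1
        counts[j] += 1
    return n_p, n_a


def multibody_energy(seq, n_p, n_a, e_m):
    """Sum E_m(A_i, n_p, n_a); missing table entries count as zero (assumed)."""
    return sum(e_m[aa].get((p, a), 0.0) for aa, p, a in zip(seq, n_p, n_a))
```

Because both bond vectors are built with the same convention, reversing the direction in which the chain is traversed leaves the sign of the dot product, and hence the classification, unchanged.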

6. Total intrinsic conformational energy 

The total internal conformational energy of the model chain was equal to: 

\[
E_{total} = E_{stiff} + E_{map} + 0.875\,E_{H\text{-}bond} + 0.75\,E_{short}
          + 1.25\,E_{pair} + 0.5\,E_{surface} + 0.5\,E_{multi} \tag{12}
\]

with the value of generic parameter $\varepsilon_{gen} = 1\,kT$.
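
The weighted sum above can be written as a short Python sketch. The weight for the short-range term is read here as 0.75, which is an interpretation of a partly illegible coefficient in the source; the dictionary keys and the function name are illustrative, and the individual energy terms are assumed to have been computed elsewhere, in kT.

```python
# Relative weights of the force-field terms (0.75 for "short" is an assumed reading).
WEIGHTS = {
    "stiff": 1.0,
    "map": 1.0,
    "H-bond": 0.875,
    "short": 0.75,
    "pair": 1.25,
    "surface": 0.5,
    "multi": 0.5,
}


def total_energy(terms):
    """Return the weighted sum of the energy terms given as a name -> value dict."""
    missing = WEIGHTS.keys() - terms.keys()
    if missing:
        raise KeyError(f"missing energy terms: {sorted(missing)}")
    return sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)
```

Keeping the weights in a single table makes it straightforward to repeat a folding run with slightly perturbed scaling and compare the outcomes.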

The relative scaling of various potentials was adjusted by a trial-and-error
method in ab initio folding experiments performed for a few selected small proteins,
including the B domain of protein A and the B1 domain of protein G. The objective
was to maintain low secondary structure content in the random coil state and
dense packing with a proper level of secondary structure in the collapsed globular
state. For instance, the small 56-residue α/β protein G domain folded ab initio in
about 30% of simulated annealing Monte Carlo simulations to a native-like structure
with an RMSD from native in the range of 4 Å. The majority of the remaining
misfolded conformations had native-like secondary structures, but they had
topological errors, usually involving the wrong order of β-strands in the
four-member β-sheet. The model was not sensitive to small variations in these scaling
parameters.

Building the starting lattice model

A separate algorithm was used to build an initial lattice model from a given 
target sequence alignment to a template structure. Such alignments contain gaps and 